Skip to content

Conversation

@andreaTP
Copy link
Collaborator

@andreaTP andreaTP commented Sep 5, 2024

Alternative ( to #514 ) fix for #513

I think this is simpler and, at the end of the day, more easily maintainable (fun fact: I have seen multiple ppl stumbling on the .tsv as IntelliJ was not respecting the tabs 😅 ).

@jmillikin
Copy link

(author of #513 / #514 here)

I don't have a strong preference either way -- if this PR gets merged I'll close mine.

If you expect to add more instructions over time (for example when adding support for SIMD or multi-memory) then keeping the table might be better. Java is not generally considered a terse language, so having a bit of codegen can be useful for making the programmer's intent clear.

fun fact: I have seen multiple ppl stumbling on the .tsv as IntelliJ was not respecting the tabs 😅

I was definitely surprised to see TSV used here. Similar tables I've seen in other projects treat tabs and spaces as just plain whitespace, which tends to survive editor round-trips better. For an example see the Linux kernel's arch/arm/tools/syscall.tbl. I've also seen people use structured formats such as lines of JSON, which can be nice when items are heterogenous.

Regardless of which approach is used, thank you for taking the time. I'll answer your questions in the other PR in case that helps with the decision.

Copy link
Collaborator

@electrum electrum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like getting rid of the code generation for this. Adding new instructions here is very minor compared to the rest of the implementation.

for (OpCode e : OpCode.values()) {
byOpCode.put(e.opcode(), e);
}
signature.put(UNREACHABLE, new WasmEncoding[] {});
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since each opcode has an encoding signature, we can simplify this and make it less error prone by moving the signature to the enum constructor (can static import the encoding constants):

UNREACHABLE(0x00, List.of()),
NOP(0x01, List.of()),
BLOCK(0x02, List.of(VARUINT)),
LOOP(0x03, List.of(VARUINT)),
BR_TABLE(0x0E, List.of(VEC_VARUINT, VARUINT)),

The constructor call would be simpler if we use varargs for the encodings, but being explicit about the signature seems better.

return byOpCode.get(opcode);
}

private static final Map<OpCode, WasmEncoding[]> signature = new HashMap<>();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's change this to Map<OpCode, List<WasmEncoding>> so that it is immutable


static {
for (OpCode e : OpCode.values()) {
byOpCode.put(e.opcode(), e);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might want a validation here

if (byOpCode.put(e.opcode(), e) != null) {
    throw new AssertionError("duplicate opcode: " + e.opcode());
}

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved the Map to be an array, I see the value of the check, but is, ultimately, only for us developers of the engine.
I left it out for now.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, and you would find the error immediately anyway, so it's not needed.

@andreaTP andreaTP force-pushed the rem-wasm-support-plugin branch from 0edb29d to a8f3ffc Compare September 6, 2024 10:19
@andreaTP
Copy link
Collaborator Author

andreaTP commented Sep 6, 2024

@electrum moved to a fully hand rolled implementation, let me know what you think about this!

Copy link
Collaborator

@electrum electrum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great!

I don't know if an array is needed here. With 8 byte references, each array is 512kB, so this adds a 1MB memory overhead. Does this provide a measurable improvement when parsing WASM files?

If there is no improvement and the memory overhead is a problem for someone, we can switch back to a hash map later.

// System.out.println("b: " + b + " op: " + op);
var signature = OpCode.getSignature(op);
var signature = OpCode.signature(op);
// TODO: Encode this in instructions.tsv ?
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove outdated comment

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@andreaTP andreaTP force-pushed the rem-wasm-support-plugin branch from a8f3ffc to 3527e6e Compare September 9, 2024 09:13
@andreaTP andreaTP merged commit afa3860 into dylibso:main Sep 9, 2024
13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants